Zhuoyang Zhang

Grounded 3D-Aware Spatial Vision-Language Modeling

May 28, 2026

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

May 26, 2026

Hide to Guide: Learning via Semantic Masking

May 24, 2026

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

Feb 19, 2026

ForeAct: Steering Your VLA with Efficient Visual Foresight Planning

Feb 12, 2026

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Feb 03, 2026

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Jul 02, 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Mar 27, 2025

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Oct 14, 2024